AITopics

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Neural Information Processing Systems - Jun-19-2026, 20:32:24 GMT

LayerNavigator: Finding Promising Intervention Layers for Efficient Activation Steering in Large Language Models

Activation steering is an efficient technique for aligning the behavior of large language models (LLMs) by injecting steering vectors directly into a model's residual stream during inference. A pivotal challenge in this approach lies in choosing the right layers to intervene, as inappropriate selection can undermine behavioral alignment and even impair the model's language fluency and other core capabilities. While single-layer steering allows straightforward evaluation on held-out data to identify the "best" layer, it offers only limited alignment improvements. Multi-layer steering promises stronger control but faces a combinatorial explosion of possible layer subsets, making exhaustive search impractical. To address these challenges, we propose LayerNavigator, which provides a principled and promising layer selection strategy. The core innovation of LayerNavigator lies in its novel, quantifiable criterion that evaluates each layer's steerability by jointly considering two key aspects: discriminability and consistency. By reusing the activations computed during steering vector generation, LayerNavigator requires no extra data and adds negligible overhead. Comprehensive experiments show that LayerNavigator achieves not only superior alignment but also greater scalability and interpretability compared to existing strategies.
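As a rough illustration of the ideas in this abstract, the sketch below builds a difference-of-means steering vector per layer from paired activations, scores each layer with two toy quantities, and adds a scaled vector to a residual-stream activation. The function names, the two scoring formulas, and the product used to combine them are assumptions made for illustration; they are not the criterion defined by LayerNavigator, which the abstract does not specify.

```python
import math

def mean_vec(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def steering_vector(pos, neg):
    """Difference of mean activations (positive minus negative examples)."""
    return sub(mean_vec(pos), mean_vec(neg))

def discriminability(pos, neg, v):
    """Toy score (assumed): gap between projected class means over pooled std."""
    scale = norm(v) + 1e-12
    u = [x / scale for x in v]
    p = [dot(a, u) for a in pos]
    q = [dot(a, u) for a in neg]
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    var = (sum((x - mp) ** 2 for x in p) + sum((x - mq) ** 2 for x in q)) / (len(p) + len(q))
    return abs(mp - mq) / (math.sqrt(var) + 1e-12)

def consistency(pos, neg, v):
    """Toy score (assumed): mean cosine similarity of each paired difference with v."""
    sims = []
    for a, b in zip(pos, neg):
        d = sub(a, b)
        sims.append(dot(d, v) / (norm(d) * norm(v) + 1e-12))
    return sum(sims) / len(sims)

def rank_layers(acts):
    """acts maps layer index -> (pos_rows, neg_rows); returns layers, best first."""
    scores = {}
    for layer, (pos, neg) in acts.items():
        v = steering_vector(pos, neg)
        scores[layer] = discriminability(pos, neg, v) * consistency(pos, neg, v)
    return sorted(scores, key=scores.get, reverse=True)

def steer(hidden, v, alpha=1.0):
    """Add the scaled steering vector to one residual-stream activation."""
    return [h + alpha * x for h, x in zip(hidden, v)]
```

Because the scores are computed from the same activations used to form the steering vectors, a sketch like this needs no additional forward passes, which mirrors the reuse the abstract describes.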

large language model, machine learning, natural language, (21 more...)

Country:

Europe > Austria (0.28)
Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems - Feb-14-2026, 00:25:47 GMT

58cbe393b4254da8966780a40d023c0b-Paper-Conference.pdf

large language model, machine learning, natural language, (22 more...)

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.04)
North America > United States > District of Columbia > Washington (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Government (1.00)
Media > News (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Chang, Tyler A., Bergen, Benjamin K.

Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models

arXiv.org Artificial Intelligence - Dec-4-2025

large language model, machine learning, subnetwork, (20 more...)

2504.15471

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Sivakumar, Anushka, Zhang, Andrew, Hakim, Zaber, Thomas, Chris

SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models

arXiv.org Artificial Intelligence - Oct-31-2025

This work introduces SteerVLM, a lightweight steering module designed to guide Vision-Language Models (VLMs) towards outputs that better adhere to desired instructions. Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust activations connecting the language modality with image context. This allows for fine-grained, inference-time control over complex output semantics without modifying model weights while preserving performance on off-target tasks. Our steering module requires learning parameters equal to 0.14% of the original VLM's size. Our steering module gains model control through dimension-wise activation modulation and adaptive steering across layers without requiring pre-extracted static vectors or manual tuning of intervention points. Furthermore, we introduce VNIA (Visual Narrative Intent Alignment), a multimodal dataset specifically created to facilitate the development and evaluation of VLM steering techniques. Our method outperforms existing intervention techniques on steering and hallucination mitigation benchmarks for VLMs and proposes a robust solution for multimodal model control through activation engineering.

large language model, machine learning, natural language, (19 more...)

2510.26769

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

arXiv.org Artificial Intelligence - Oct-29-2025

PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings

Kim, Junseo, Han, Jongwook, Choi, Dongmin, Yoon, Jongwook, Lee, Eun-Ju, Jo, Yohan

Visual persuasion, which uses visual elements to influence cognition and behaviors, is crucial in fields such as advertising and political communication. With recent advancements in artificial intelligence, there is growing potential to develop persuasive systems that automatically generate persuasive images tailored to individuals. However, a significant bottleneck in this area is the lack of comprehensive datasets that connect the persuasiveness of images with the personal information about those who evaluated the images. To address this gap and facilitate technological advancements in personalized visual persuasion, we release the Personalized Visual Persuasion (PVP) dataset, comprising 28,454 persuasive images across 596 messages and 9 persuasion strategies. Importantly, the PVP dataset provides persuasiveness scores of images evaluated by 2,521 human annotators, along with their demographic and psychological characteristics (personality traits and values). We demonstrate the utility of our dataset by developing a persuasive image generator and an automated evaluator, and establish benchmark baselines. Our experiments reveal that incorporating psychological characteristics enhances the generation and evaluation of persuasive images, providing valuable insights for personalized visual persuasion.

large language model, machine learning, natural language, (17 more...)

2506.00481

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Media (1.00)
Health & Medicine > Consumer Health (1.00)
Government > Military (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Menke, Antonio-Gabriel Chacón, Tan, Phan Xuan, Kamioka, Eiji

Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety

arXiv.org Artificial Intelligence - Oct-22-2025

Recent work has highlighted the importance of monitoring chain-of-thought reasoning for AI safety; however, current approaches that analyze textual reasoning steps can miss subtle harmful patterns and may be circumvented by models that hide unsafe reasoning. We present a sentence-level labeled dataset that enables activation-based monitoring of safety behaviors during LLM reasoning. Our dataset contains reasoning sequences with sentence-level annotations of safety behaviors such as expression of safety concerns or speculation on user intent, which we use to extract steering vectors for detecting and influencing these behaviors within model activations. The dataset fills a key gap in safety research: while existing datasets label reasoning holistically, effective application of steering vectors for safety monitoring could be improved by identifying precisely when specific behaviors occur within reasoning chains. We demonstrate the dataset's utility by extracting representations that both detect and steer safety behaviors in model activations, showcasing the potential of activation-level techniques for improving safety oversight on reasoning. Content Warning: This paper discusses AI safety in the context of harmful prompts and may contain references to potentially harmful content.

large language model, machine learning, natural language, (19 more...)

2510.18154

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.69)

Neural Information Processing Systems - Oct-11-2025, 00:22:54 GMT

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM.

llama-2-7b-chat-hf, target behavior, vector, (16 more...)

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.04)
North America > United States > District of Columbia > Washington (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Government (1.00)
Media > News (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence - Oct-6-2025

Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement

Wang, Anyi, Wu, Xuansheng, Shu, Dong, Ma, Yunpu, Liu, Ninghao

Steering has emerged as a promising approach to controlling large language models (LLMs) without modifying model parameters. However, most existing steering methods rely on large-scale datasets to learn clear behavioral information, which limits their applicability in many real-world scenarios. Steering vectors extracted from small datasets often contain task-irrelevant noisy features, which degrade their effectiveness. To refine the steering vectors learned from limited data, we introduce Refinement of Steering Vector via Sparse Autoencoder (SAE-RSV), which leverages SAEs to semantically denoise and augment the steering vectors. In our framework, we first remove task-irrelevant features according to their semantics provided by SAEs, and then enrich task-relevant features missing from the small dataset through their semantic similarity to the identified relevant features. Extensive experiments demonstrate that the proposed SAE-RSV substantially outperforms all the baseline methods, including supervised fine-tuning. Our findings show that effective steering vectors can be constructed from limited training data by refining the original steering vector through SAEs.
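To make the denoising idea concrete, here is a minimal sketch of the first step only: encode a steering vector into sparse autoencoder features, zero out the features judged task-irrelevant, and decode back. The tiny hand-written SAE weights, the `relevant` index set, and all function names are hypothetical; in the paper the relevance judgment comes from SAE feature semantics, and the enrichment step via semantic similarity is not shown here.

```python
def relu(x):
    return x if x > 0 else 0.0

def sae_encode(v, W_enc, b_enc):
    """Feature activations: ReLU(W_enc @ v + b_enc), one row of W_enc per feature."""
    return [relu(sum(w * x for w, x in zip(row, v)) + b) for row, b in zip(W_enc, b_enc)]

def sae_decode(f, W_dec):
    """Reconstruction: sum of feature activations times decoder directions.
    W_dec[i] is the decoder direction of feature i."""
    dim = len(W_dec[0])
    return [sum(f[i] * W_dec[i][j] for i in range(len(f))) for j in range(dim)]

def denoise_steering_vector(v, W_enc, b_enc, W_dec, relevant):
    """Keep only features whose index is in `relevant` (assumed given), then decode."""
    f = sae_encode(v, W_enc, b_enc)
    kept = [a if i in relevant else 0.0 for i, a in enumerate(f)]
    return sae_decode(kept, W_dec)

# Toy usage with an identity "SAE" over 3 dimensions (purely illustrative).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
refined = denoise_steering_vector(
    [1.0, 0.5, 2.0], identity, [0.0, 0.0, 0.0], identity, relevant={0, 2}
)
```

Note that the ReLU in this toy encoder discards negative pre-activations, so a real pipeline would need an SAE trained on the model's activations for the decoded vector to stay close to the original steering direction.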

large language model, machine learning, natural language, (19 more...)

2509.23799

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial Intelligence - Sep-23-2025

AutiHero: Leveraging Generative AI in Social Narratives to Engage Parents in Story-Driven Behavioral Guidance for Autistic Children

Lee, Jungeun, Lee, Kyungah, Hwang, Inseok, Park, SoHyun, Kim, Young-Ho

Social narratives are known to help autistic children understand and navigate social situations through stories. To ensure effectiveness, however, the materials need to be customized to reflect each child's unique behavioral context, requiring considerable time and effort for parents to practice at home. We present AutiHero, a generative AI-based social narrative system for behavioral guidance, which supports parents in creating personalized stories for their autistic children and reading them together. AutiHero generates text and visual illustrations that reflect their children's interests, target behaviors, and everyday contexts. In a two-week deployment study with 16 autistic child-parent dyads, parents created 218 stories and read an average of 4.25 stories per day, demonstrating a high level of engagement. AutiHero also provided an effective, low-demand means to guide children's social behaviors, encouraging positive change. We discuss the implications of generative AI-infused tools for empowering parents to guide their children's behaviors and foster their social learning.

autihero, machine learning, natural language, (13 more...)

2509.17608

Country: Asia (0.46)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Autism (1.00)
Education > Focused Education > Special Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.91)